feat(datafusion): Implement IcebergWriteExec for DataFusion write support #1585

Merged
11 commits merged into apache:main from ctty/df-write-node
Aug 12, 2025

CTTY
Contributor

@CTTY CTTY commented Aug 6, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Added IcebergWriteExec, which writes the output of the input execution plan to Parquet files and returns the serialized data files

Are these changes tested?

Added unit tests.

@CTTY CTTY marked this pull request as ready for review August 7, 2025 00:23
Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @CTTY for this PR, in general it looks good! Just one minor nit.

}

impl IcebergWriteExec {
pub fn new(table: Table, input: Arc<dyn ExecutionPlan>, schema: ArrowSchemaRef) -> Self {
Contributor

Another point is that we should ensure that the input schema matches the table's schema; otherwise we would be doing schema evolution during the write.

Contributor Author

Column nullability and field types are checked within execute_input_stream when it binds the Iceberg table schema to the input RecordBatch, so we don't need to worry about it now.

This may prevent us from doing any form of schema evolution, but I think that's a separate issue.
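
For illustration only, the kind of schema check discussed in this thread could be sketched roughly as below. This is not the code in this PR and not the Arrow or Iceberg API: `DataType`, `Field`, and `check_schema_matches` are hypothetical, std-only stand-ins, and the rule that a nullable input column may not feed a required table column is an assumption about what such a check would enforce.

```rust
// Hypothetical, std-only stand-ins for the Arrow/Iceberg schema types.
// The real check in this PR is said to happen inside `execute_input_stream`.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
    Float64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Returns Ok(()) when every input field lines up with the table field at
/// the same position: same name, same type, and (by assumption) no nullable
/// input column feeding a required table column.
pub fn check_schema_matches(table: &[Field], input: &[Field]) -> Result<(), String> {
    if table.len() != input.len() {
        return Err(format!(
            "field count mismatch: table has {}, input has {}",
            table.len(),
            input.len()
        ));
    }
    for (t, i) in table.iter().zip(input) {
        if t.name != i.name {
            return Err(format!(
                "field name mismatch: expected `{}`, got `{}`",
                t.name, i.name
            ));
        }
        if t.data_type != i.data_type {
            return Err(format!(
                "type mismatch for `{}`: expected {:?}, got {:?}",
                t.name, t.data_type, i.data_type
            ));
        }
        if i.nullable && !t.nullable {
            return Err(format!(
                "`{}` is required in the table but nullable in the input",
                t.name
            ));
        }
    }
    Ok(())
}
```

Calling `check_schema_matches(&table_fields, &input_fields)` before writing would reject a mismatched input up front instead of silently evolving the schema, which is the concern raised above.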

Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @CTTY for this PR, LGTM!

@liurenjie1024 liurenjie1024 merged commit bc469c3 into apache:main Aug 12, 2025
18 checks passed
@CTTY CTTY deleted the ctty/df-write-node branch August 12, 2025 16:34
Development

Successfully merging this pull request may close these issues.

Implement Writer Node: Spawn Iceberg writers and write the input data
2 participants